Application of Different Learning Methods to Hungarian Part-of-Speech Tagging

Authors

Tamás Horváth

Zoltán Alexin

Tibor Gyimóthy

Stefan Wrobel

Abstract

From the point of view of computational linguistics, Hungarian is a difficult language due to its complex grammar and rich morphology. This means that even a common task such as part-of-speech tagging presents a new challenge for learning when applied to Hungarian, especially given that the language has fairly free word order. In this paper we therefore present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as propositional and ILP representatives, and added experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are promising approaches to POS tagging for Hungarian as well. The paper also includes experiments with several different cascade connections of the taggers.


Similar resources

Lessons learned from tagging clinical Hungarian

As more and more textual resources from the medical domain are getting accessible, automatic analysis of clinical notes becomes possible. Since part-of-speech tagging is a fundamental part of any text processing chain, tagging tasks must be performed with high accuracy. While there are numerous studies on tagging medical English, we are not aware of any previous research examining the same fiel...


A part-of-speech tagging system for the Persian language

Abstract: Part-of-speech (POS) tagging is essential for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...


Application of AGLEARN for Hungarian Part-of-speech

In this paper we present an application of the AGLEARN method to the part-of-speech (POS) tagging of Hungarian sentences. The task of AGLEARN is to infer the semantic functions associated with the productions. In the learning process, the grammar, background semantic functions and examples can be used. We applied the AGLEARN method to infer context rules to choose the correct tags. A corpus ...


Manually Annotated Hungarian Corpus

The current paper presents the results of a two-year project during which a consortium of the University of Szeged and MorphoLogic Ltd. Budapest developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (MorphoSyntactic Description) has been used. The corpus contains texts of five different topic areas...


Using a morphological analyzer in high precision POS tagging of Hungarian

The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state of the art tagging methods and is able to handle out of v...




Publication date: 1999
